FOREWORD


This  report  presents  the  results  of  a  study  of  the  specifications 
for  an  information  system  intended  to  support  the  design,  production 
and  maintenance  of  large  computer  programming  systems.  Called 
Evolutionary  System  for  Data  Processing,  or  ESDP,  it  was  begun  as  an 
internal  IBM  project  in  1965  by  the  Center  for  Exploratory  Studies 
of  the  Federal  Systems  Division  and  continued  under  Air  Force 
sponsorship  during  1967  and  early  1968. 

This work has been performed under contract number F19628-67-C0254
for the Electronic Systems Division, U.S. Air Force Systems
Command.  The  project  monitor  was  Mr.  John  Goodenough,  ESLFE. 

The authors wish to express their appreciation for the encouragement
and assistance provided by Dr. John Egan, formerly of ESD, and
their colleagues Dr. Harlan D. Mills and Mr. Michael Dyer.

This report is in four volumes: Volume 1, System Description;
Volume 2, Control and Use of the System; Volume 3, The CAINT Executive
Language and Instruction Generator; and Volume 4, Programming
Specifications. This report was submitted on January 31, 1968.

This  report  has  been  reviewed  and  is  approved. 


WILLIAM  F.  HEISLER,  Col,  USAF 
Chief,  Command  Systems  Division 


Project  Officer 
ABSTRACT


ESDP is a proposed system whose purpose is to acquire,
store, retrieve, publish and disseminate all documentation,
exclusive of graphics, concerned with a large computer
programming activity. Documentation is deemed to consist, not
only of final or formally published after-the-fact reports, but
of working files, design and change notices, informal drafts,
management reports--in fact, the entire recordable rationale
underlying a programming system. Maximum attention has been
concentrated on the means of acquiring and organizing
documentation. Two major, complementary approaches are proposed.
The first is called Program Analysis and is a process of
documentation directly from completed programs. The
second is called Computer Assisted Interrogation and is a process
of eliciting information directly from the programmer, through
on-line communication terminals. The former provides canonical
data about the program's structure. The latter provides
explanatory material about all aspects of the program, and in the
absence of canonical data, may provide tentative structural
information as well. The conclusion of the study group is that
ESDP is a feasible concept with present-day technology and that
it will materially benefit using organizations in the production
of programs and in guiding their evolution as requirements
change. Its value will be greater for larger organizations,
whose internal communications difficulties tend to cause truly
gigantic inefficiencies. Its implementation as a support system
for such projects would require a significant quantum of
investment in order to produce these benefits and is predicated
on the use of a computer system dedicated solely to the use of
ESDP.
SYSTEM DESCRIPTION


1. General Approach to Programming. The general architecture
proposed for the ESDP system is that used for Operating
System/360-Queued Telecommunication Access Method (OS-QTAM) (see
Figure  1).  In  such  a  system,  terminals  communicate  with  the 
central  processing  unit  via  telephone  lines  and  a  multiplexor 
channel.  In  the  central  processing  unit,  two  or  more  programs 
are  operating  asynchronously  in  separate  partitions  of  high  speed 
memory  under  control  of  the  OS  supervisor. 

In  one  partition,  the  Message  Control  Program  plus  some 
additional  QTAM  code  dispatches  incoming  and  outgoing  messages. 
The  Message  Control  Program  makes  use  of  core  buffers  (the  number 
and  size  being  specified  by  the  programmer)  plus  message  queue 
storage  on  a  direct  access  storage  device. 

In  the  other  partitions  are  the  Message  Processing 
Programs. These programs perform all the ESDP processing
functions.  They  receive  messages  from  and  transmit  messages  to 
the  Message  Control  Program  via  GET  and  PUT  macro  commands.  When 
a  message  has  been  received,  an  ESDP  controller,  one  of  the 
Message  Processing  Programs,  must  first  determine  what  activity 
the  sender  is  involved  in.  For  instance,  it  must  recognize 
whether  a  message  is  a  response  to  a  question  in  interrogation 
or,  say,  a  query.  Once  this  determination  has  been  made,  some 
type  of  housekeeping,  depending  on  the  particular  message  and 
activity,  is  performed  to  initialize  the  ESDP  functional 
routines.  Program  control  is  then  switched  to  the  particular 
module  of  programming  required  to  perform  the  desired  activity. 
These modules interact with the system files and issue messages
back  to  the  terminals  via  PUT  commands. 
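
The controller logic just described can be sketched in a few lines. This is a minimal illustration only: the activity names, the handler functions, and the convention that a message beginning with "?" opens a query are all assumptions made for the example, not part of the ESDP design.

```python
from typing import Callable, Dict

# Hypothetical functional routines; the report does not name them.
def handle_interrogation(text: str) -> str:
    return f"recorded answer: {text}"

def handle_query(text: str) -> str:
    return f"query result for: {text}"

HANDLERS: Dict[str, Callable[[str], str]] = {
    "interrogation": handle_interrogation,
    "query": handle_query,
}

# Records the activity each terminal is currently engaged in.
sessions: Dict[str, str] = {}

def controller(terminal_id: str, message: str) -> str:
    """Decide which activity a message belongs to, then switch control."""
    # Assumed convention: a leading '?' starts a query; any other message
    # is a response within the terminal's current activity.
    if message.startswith("?"):
        activity, body = "query", message[1:].strip()
    else:
        activity, body = sessions.get(terminal_id, "interrogation"), message
    sessions[terminal_id] = activity   # housekeeping before the switch
    return HANDLERS[activity](body)    # the reply would be issued via PUT

reply_1 = controller("T1", "? UOP list")
reply_2 = controller("T2", "yes")
```

A table of handlers keeps the controller itself independent of the number of functional modules, which is one plausible reading of "program control is then switched to the particular module."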

In  addition  to  providing  the  capabilities  outlined 
above,  the  operating  system  for  ESDP  must  be  concerned  with  the 
following  requirements:  (1)  More  than  one  user  terminal  may  be 
communicating  with  any  one  program  module  at  a  time.  This 
requirement  may  best  be  met  by  assuring  that  the  program  modules 
are reentrant. (Note that in our current experimental work we
have used PL/I which produces reentrant code.) (2) Different
user terminals may be communicating with different program
modules at the same time. We feel that this requirement can
probably be met by a multi-tasking supervisor such as that now
used in OS/360 with Multiprogramming with a Variable Number of
Tasks (MVT). This will provide for a primitive form of time
sharing by activating tasks whenever an I/O operation occurs,
making use of a priority system for the tasks. For the system
described,  this  type  of  time  sharing  should  suffice,  since  there 
should  be  no  periods  of  long  processing,  uninterrupted  by  I/O 
commands.  (3)  More  than  one  user  terminal  may  be  accessing  any 
one  data  element  at  the  same  time.  This  will  require  that  some 
form  of  data  base  lockout  be  placed  into  the  system. 
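
Requirement (3) can be illustrated with a small lockout table that grants one holder at a time per data element. The sketch uses Python threads to stand in for terminals; the class name, method name, and element identifier are hypothetical, and the report does not specify how lockout would actually be implemented.

```python
import threading
from contextlib import contextmanager

class LockoutTable:
    """One lock per data element, created on first use."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the table itself
        self._locks = {}

    @contextmanager
    def hold(self, element_id):
        with self._guard:
            lock = self._locks.setdefault(element_id, threading.Lock())
        lock.acquire()                   # other users of this element wait here
        try:
            yield
        finally:
            lock.release()

table = LockoutTable()
record = {"count": 0}

def update():
    for _ in range(1000):
        with table.hold("record"):       # lock out other terminals
            record["count"] += 1

workers = [threading.Thread(target=update) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Locking per element rather than per file lets users working on unrelated data proceed without waiting on one another.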
Figure 1. General ESDP System Concept

The general concept of ESDP is for a teleprocessed
terminal-oriented  system.  The  terminals  themselves  should 
comprise  the  following: 

a.  A  cathode  ray  tube  display  with  keyboard  entry. 

Most of the conversational processes will be performed
through  this  device.  Light  pen  capability  and/or  vector  drawing 
capability  may  be  desired  depending  on  the  need  for  activities 
such  as  production  of  graphics  in  the  documentation.  For  the 
strictly  conversational  documentation  activities,  generation  of 
an  average-sized  character  set  (e.g.,  64  character  set  including 
numbers  and  upper  case  letters)  on  the  face  of  the  CRT  should 
suffice.

b.  Hard  copy  printer. 

It is often desirable to retain a hard copy of that which
has been displayed on the CRT. This can be accomplished via a
typewriter type printer (without keyboard), the printing being
activated by command from the keyboard associated with the CRT.

c. Line printer

Line printing should be centralized so that high volume
outputs can be generated in the machine room for subsequent
manual transmittal to the requesting user.

d. Terminal polling

A round-robin polling system with priorities such as
that used by QTAM seems appropriate. Of course, if inefficiency
results, perhaps the priority scheme should be revised so as to
be based on the particular activity, for instance, rather than
simply the terminal identification.

It is anticipated that during hours when normal ESDP
documentation activity is light, other programs can be run that
are not under the general QTAM-ESDP set up. Examples of such
programs are:

File cleanup programs--It may be necessary to move data
on the direct access storage devices in order to reuse space
freed via deletion of records. This reorganizing is one type of
file processing that might be performed off-line. In addition,
there are normal utility functions such as disk copying, disk
printing, etc., that could fit in this category.

Pre-processors--There may be some pre-processing
desired for the CAINT Executive Language. This is particularly
true when debugging macros are to be used. Such pre-processors
could operate off-line.
2. Hardware Assumptions. Hardware has not been considered in
this study, except indirectly, when feasibility of attaining
various objectives was considered. There were, however, some
basic hardware assumptions underlying the study. These are:

a. Machine Utilization

A computing system will be dedicated to ESDP.

b. Machine Type

A System/360 computer with Operating System/360 was
used for the experimental programming in this project. This
choice of hardware, of course, is not mandatory. However, much
of the discussion in this report is based on S/360 with OS and,
therefore, uses that terminology.

II


DATA  BASE 


We  foresee  the  need  for  several  files,  or,  in  the 
terminology adopted herein, file sets. While it would be
possible  to  store  most  of  the  information  to  be  described  below 
in  one  monolithic  file,  this  breakdown  recognizes  differing 
frequencies  of  file  modification,  different  processes  to  be 
performed  on  data,  and  different  means  of  control  of  access. 

1. Program Description File Set. This file set contains a
logical  record  for  each  unit  of  programming.  Its  organization 
will  bear  a  close  resemblance  to  the  outline  of  a  conventional 
program  description,  but  there  is  no  permanent  standard  and  it  is 
expected  and  encouraged  that  the  content  and  composition  of  this 
file  will  be  shaped  by  the  users  to  fit  their  own  needs. 

The  major  subjects  to  be  covered,  in  a  generalized  form 
of  the  file  are: 

o  Identification--of the program, programmer, date,
   etc.

o  Program Structure--in terms both of the
   hierarchical structure of the program and of the
   branching, or control, structure.

o  Data References--the data items named by the
   program and the nature of their use.

o  Logic Description--both symbolic and natural
   language descriptions of what the program does,
   how, why.

o  Management and Status Data--information relative
   to the program as an item being produced, its
   schedule, progress, problems, etc.

o  Illustration References--references to flow charts
   and tables to be composed by ESDP and to be
   printed with this program description information.
   Also, references to other graphics, used for
   illustration, which are not able to be stored
   within the ESDP computer.

Except  for  the  identification  section,  for  which  no  amplification 
is  necessary,  these  items  are  discussed  below  in  greater  detail. 

a.  Program  Structure 

This section would contain pointers to related UOP's
(Units of Programming). There are two general categories of
relationship: hierarchical and control. Hierarchical pointers
would indicate subordination or superordination, and control
pointers would indicate entry points or predecessor and successor
UOP's. Entries from, or exits to, label variables would be
treated as a special case, with the variable representing a
program switch which might be given a form of UOP status. Also
contained in this section would be a codification of the type of
branch control (whether unconditional, such as a PL/I GO TO; or
conditional, such as an IF or DO) and the variables that affect
the branch. In addition, there would be narrative explanations of
the control logic, or pointers to such explanations.

b.  Data  References 

The  exact  extent  to  which  data  documentation  should  be 
split  or  duplicated  between  the  program  and  the  data  description 
files  depends  on  the  philosophy  of  management  of  the  object 
system.  At  a  minimum,  this  section  of  a  program  description  file 
must  list  the  data  elements  that  occur  in  the  program,  and  must 
give the nature of the usage, such as a control variable (a
variable that directly affects a branching decision), a computed
value (set by an assign or DO statement) or any of a number of
other  categories  of  usage.  The  bulk  of  the  actual  description  of 
the  data,  as  differentiated  from  the  codification  of  the  nature 
of  its  use,  will  be  carried  in  the  Data  Description  File  Set. 
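
The Program Structure and Data References sections might be represented along the following lines. This is a sketch under stated assumptions: the field names, the identifiers such as "MAIN.DO1", and the restriction to two usage categories are illustrative choices, since the report leaves the record composition to the users.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class Usage(Enum):
    CONTROL = "control variable"   # directly affects a branching decision
    COMPUTED = "computed value"    # set by an assignment or DO statement

@dataclass
class DataReference:
    name: str
    usage: Usage

@dataclass
class ProgramDescription:
    uop_id: str
    parent: Optional[str] = None                           # hierarchical: superordinate UOP
    children: List[str] = field(default_factory=list)      # hierarchical: subordinate UOPs
    predecessors: List[str] = field(default_factory=list)  # control pointers
    successors: List[str] = field(default_factory=list)    # control pointers
    data_refs: List[DataReference] = field(default_factory=list)

    def names_used_as(self, usage: Usage) -> List[str]:
        """List the data elements this UOP uses in the given way."""
        return [ref.name for ref in self.data_refs if ref.usage is usage]

loop = ProgramDescription(
    uop_id="MAIN.DO1",
    parent="MAIN",
    successors=["MAIN.IF2"],
    data_refs=[
        DataReference("I", Usage.COMPUTED),
        DataReference("LIMIT", Usage.CONTROL),
    ],
)
control_names = loop.names_used_as(Usage.CONTROL)
```

Only the name and the usage code are held here; the description of the data element itself would live in the Data Description File Set.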

c.  Logic  Description 
d. Management and Status Data

The obvious kind of information to [...]
do with progress in meeting schedules and [...]
assigned, problems being met or anticipated.
Mostly, the information will be collected b[...]
some, such as number and dates of compilation [...]
etc., can be acquired automatically.
e. Illustration References

We may group illustration material into three classes:
flow charts or tables that are required according to the system
documentation plan, flow charts or tables volunteered by the
programmer or file documentor to augment some aspect of his
narrative, and other illustrative material that is volunteered,
but is not in flow chart or table form. The restriction on flow
chart and table form arises, of course, from the planned use of
standard graphic programs that will produce these configurations
easily, but cannot handle the full range of graphic input. Thus,
if a programmer wishes to input a logarithmic graph, or a diagram
of two aircraft on a collision course, he will probably have to
draw these in the conventional manner, but include in the
machine-stored documentation a reference to the illustration
copy.


In the program description file set will be stored only
pointers  to  the  detailed  illustration  information,  whether  or  not 
stored  within  ESDP.  We  recommend  this  separation  because  these 
files  will  be  large,  and,  while  they  may  be  updated  whenever  the 
program  is,  they  will  rarely  be  subject  to  information  retrieval 
searches  in  the  same  way  as  detailed  data  in  the  remainder  of  the 
file. A user may want to retrieve the information that is
displayed  on  a  chart,  but  he  will  not  normally  want  to  retrieve 
the  detailed  FLOWCHART  instructions  that  organize  the  display. 

2.  Data  Description  File  Set.  The  recommended  approach  for 
documentation  of  data  files  is  to  create  a  data  description 
record  for  each  file  or  structure  used  in  any  program,  with  as 
many working records as needed to give each user a chance to
document  the  data  the  way  he  prefers.  Another  version  of  the 
descriptive  record  will  be  created  and  maintained  by  an 
authorized  COMPOOL,  or  data  base,  controller  in  whom  will  be 
vested  authority  to  make  final  decisions  on  data  definitions  and 
attributes,  and  who  can  delete  working  records  at  will.  His 
expected  mode  of  operation,  then,  should  be  to  review  his  data 
items  periodically,  look  at  the  conflicting  descriptions  or 
requests  of  the  individual  programmers,  make  his  decision  on 
which version to accept or declare, and put the final decisions
into  the  permanent  record. 

Provision  can  be  made,  using  the  dissemination 
services,  to  promulgate  the  data  base  controller's  decisions 
immediately  to  all  programmers  concerned. 

The  organization  of  the  data  description  record  will  be 
similar  to  that  for  a  program.  It  will  have  the  following  major 
headings: 

o  Identification

o  Element description--the narrative and
   other information about the item and its
   use.

o  Structure--primarily, this is used for
   higher level structures in order to
   define the subordinate structure.

o  Using program references--pointers,
   possibly some supporting text on what
   effect the particular using program has
   on the data.

o  Illustration references

3. Program File Set. This file set will contain the text of the
programs  being  documented.  In  addition,  consideration  will  be 
given  to  storing,  within  this  file  set,  program  change 
information,  separate  from  the  program,  itself.  This  will  permit 
a  record  to  be  kept  of  all  changes,  and  this  will  permit 
programmers  to  make  changes  either  to  the  latest  version  of  a 
program,  any  previous  version,  or  both  simultaneously.  The 
complete  text  of  any  version  of  the  program  could  be  retrieved  on 
request.  Another  possible  class  of  information  for  this  file  set 
is  partially  reduced  program  analysis  data.  This  would  be 
intermediate  output,  produced  during  a  program  analysis  run, 
which  could  be  saved  to  reduce  the  time  required  to  process  a 
change  to  the  program. 
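
One way to hold change information separately from the program text, and still retrieve any version on request, is sketched here. The change format (a line position in the base text paired with replacement text, or nothing for a deletion) is an assumption for illustration; the sketch handles a single line of versions and does not cover insertions or simultaneous changes to several versions.

```python
from typing import List, Optional, Tuple

# (line index in the base text, replacement text; None deletes the line)
Change = Tuple[int, Optional[str]]

class ProgramFile:
    def __init__(self, base: List[str]):
        self.base = list(base)
        self.changes: List[List[Change]] = []   # one entry per recorded change

    def record_change(self, change: List[Change]) -> int:
        """Store a change apart from the text; return the new version number."""
        self.changes.append(change)
        return len(self.changes)

    def text(self, version: int) -> List[str]:
        """Rebuild the complete text as of `version` (0 is the base program)."""
        lines: List[Optional[str]] = list(self.base)
        for change in self.changes[:version]:
            for index, replacement in change:
                lines[index] = replacement      # deleted lines keep their slot
        return [line for line in lines if line is not None]

pf = ProgramFile(["A = 1;", "B = 2;", "C = 3;"])
v1 = pf.record_change([(1, "B = 20;")])
v2 = pf.record_change([(2, None)])
```

Because the base text is never overwritten, every earlier version remains recoverable by replaying fewer changes.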

4.  Graphic  Coding  File  Set.  We  recommend  the  use  of  a  program 
for  the  automatic  production  of  flow  charts  and  tables.  In  some 
systems,  such  as  FLOWCHART  [1],  graphics  are  assembled  by  the 
issuance  of  commands  on  how  to  build  them,  in  a  manner  similar  to 
computer  programming.  The  Graphic  Coding  File  Set  would  contain 
these  instructions. 

The  graphic  files  may  be  updated  separately  from  the 
program or data files they illustrate, but this form of updating
should probably be restricted to changes in layout. Changes in
content  or  structure  should  be  keyed  to  changes  in  the  data  or 
programs  being  described,  although  the  initiative  for  a  change 
may  originate  with  either  a  program  or  data  description  change  or 
with  a  graphic  change. 

5.  Publication  File  Set.  This  file  set  will  contain  partially 
processed  documentation,  taken  from  any  of  the  other  files.  The 
preparation  of  copy  for  publication  can  be  a  time-consuming 
process.  Hence,  partially  edited  material  should  be  retained  in 
machine  readable  form  for  reprinting  or  for  selection  for 
inclusion  in  differently-organized  documents.  This  form  of 
storage is used by the IBM Administrative Terminal System [2], a
text  processing  system,  a  successor  of  which  is  recommended  for 
use  in  ESDP. 
6. Instruction Course File Set. Instructional material produced
through ESDP plays a dual role. It is used in its own right as a
system program, but it is also a form of documentation and
changes in it must be keyed to changes in the programs or data
being taught. Hence, instruction courses can be stored as
programs, but must have appropriate pointers back and forth to
the documentation from which they were derived.


7.  Dissemination  File  Set.  These  files  will  contain  the  profile 
and  distribution  lists  needed  to  operate  the  internal  ESDP 
dissemination  system  on  documentation  and  changes  thereto. 


8.  Index  File  Set.  These  files  are  those  indexes  and  inverted 
indexes  used  by  the  information  retrieval  system  to  carry  out  its 
functions.  These  are  also  dynamic  files,  which  are  subject  to 
frequent  change  as  the  documentation  files  change. 
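
For readers unfamiliar with the term, an inverted index maps each word to the records that contain it, so that a search need not scan the documentation files themselves. The sketch below is a generic illustration; the record identifiers and texts are invented, and the report does not describe the index format ESDP would use.

```python
from collections import defaultdict

def build_inverted_index(records):
    """Map each word to the set of record identifiers containing it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for word in text.lower().split():
            index[word].add(record_id)
    return index

def search(index, *words):
    """Return the records that contain every one of the given words."""
    sets = [index.get(word.lower(), set()) for word in words]
    return set.intersection(*sets) if sets else set()

records = {
    "UOP1": "reads payroll file",
    "UOP2": "updates payroll totals",
}
index = build_inverted_index(records)
hits = search(index, "payroll", "file")
```

The index must be rebuilt or patched whenever a record changes, which is why these files are described as dynamic.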


9. Buffer File Set. Buffer files are dynamically created files
and are for use by the information retrieval system.
III


PROGRAM  ANALYSIS 


1. General. A Program Analysis (PA) program has been produced
by IBM [3] as part of its own internally sponsored ESDP
activities. This program accepts input in PL/I, OS/360 Job
Control or OS/360 Linkage Editor languages and compiles a
canonical or structural data file descriptive of the hierarchical
and control structure of the programs and their usages of data.

Source Code Analysis is performed by a set of compiler-like
analyzers which are oriented to a particular language. The
number  of  analyzers  is  dependent  on  the  make-up  of  the  user's 
programming  system.  The  role  of  each  analyzer  is  identical 
regardless  of  the  language,  namely  to  map  source  code  into  the 
UOP  coordinate  structure  and  generate  the  data  records  associated 
with  it.  In  this  way,  each  analyzer,  which  is  necessarily 
language  dependent,  can  effect  a  common  interface  with  the 
system. 


Three analyzers have been written, for OS/360 JOB
Control Language (JCL), OS/360 Linkage Editor Language, and
OS/360 PL/I Language. This sample was selected to permit
experimentation with programming systems written primarily in
PL/I, of which the analyzer, itself, is an example. We note again
that the treatment of run-time languages (e.g., JCL) as
programming languages is a critical point since it is key to the
automatic analysis of system-wide interactions.


Current  compilers  and  assemblers  now  generate  source 
code  listings  and  cross  reference  lists  for  data  variables  for  a 
single  program  at  compilation  time.  However,  this  is  ordinarily 
the  extent  of  their  automatic  capabilities.  Additional 
programming  information  on  program  interactions  within  a  larger 
system,  rationale  behind  program  logic  and  program  groupings, 
data  flow  through  the  system,  and  so  on,  are  necessarily  based  on 
interrogation-acquired  documentation. 

The  analyzer  parses  the  source  program  into  elements 
called  Units  of  Programming  (UOP).  The  current  program  produces 
UOP's  at  the  following  levels: 

JOB

LOAD MODULE

SOURCE MODULE (Compilation Unit)

CALL MODULE - Procedure Block

GROUP - BEGIN block, DO group, IF compound statement

SEGMENT - ON compound statement


For each UOP in a structure, a data record is created
which contains the appropriate structure, logic and data usage
information. This structure and logic data of the individual UOP
records is also the mechanism for creating the total structure of
a program system or any of its major components.


To make the programmer aware of how [...]
structured, a revised program listing is [...]
graphically depicts the coordinate structure a[...]
this program. This revised listing is useful [...]
guide to the files, but also as a picture of p[...]
which may easily become obscure in the co[...]
listing, particularly with free format languages [...]
statements can be strung together in a single pr[...]ine.

System-wide  interactions  of  a  program  can  be  obtained 
through  the  automatic  analysis  of  the  Object  Module  generated  by 
the  compiler  and  the  JCL  deck  that  would  be  written  for 
execution. 


0 

program  in 
This information is critical when sets of programs are
linked  together  and  manipulate  the  same  data,  since  this  is  the 
source  of  most  problems  and  delays  during  integration  of 
programming  systems  where  the  various  pieces  were  written  and 
debugged  by  different  people. 

Within  OS/360,  the  execution  of  a  program  would  require 
a  JOB  Control  deck  or  program.  The  analysis  would  equate,  within 
the  UOP  records,  the  file  declarations  at  the  JOB  level  with  all 
references  to  these  files  down  to  the  SEGMENT  level.  In  a  more 
complex  case,  where  condition  codes  and  multiple  job  steps  were 
defined, this same correlation of program units and data usage
would have added significance.

2.  Operation  of  Program  Analysis.  The  analysis  of  PL/I  code  is 
performed  in  several  phases.  In  Phase  1,  PL/I  source  code  is 
read  in,  and  blanks,  comments,  and  constants  are  eliminated.  The 
remaining  characters  are  translated  through  use  of  a  translation 
table.  The  general  effect  of  this  translation  is  to  replace  the 
source  language  string  with  numeric  codes  in  such  a  way  that 
alphanumeric strings are grouped in the higher number codes,
special characters in the lower number codes, and operation codes
in the middle. This is done so that in future processing, simple
limit testing can be used on the codes to determine the type.
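
The value of grouping the codes by range can be seen in a small sketch. The actual code assignments are not given in this report, so the values below are hypothetical; only the ordering (special characters lowest, operation codes in the middle, alphanumerics highest) follows the text.

```python
# Hypothetical code assignments; only their relative order is from the report.
SPECIAL = {";": 1, ",": 2, "(": 3, ")": 4}         # lowest codes
OPERATION = {"=": 50, "+": 51, "-": 52, "*": 53}   # middle codes
ALPHANUMERIC_BASE = 100                            # highest codes

def translate(ch):
    """Replace one source character with its numeric code."""
    if ch in SPECIAL:
        return SPECIAL[ch]
    if ch in OPERATION:
        return OPERATION[ch]
    if ch.isalnum():
        return ALPHANUMERIC_BASE + ord(ch.upper())
    raise ValueError(f"no code assigned to {ch!r}")

def kind(code):
    """Determine the type of a code by limit testing alone."""
    if code < 50:
        return "special"
    if code < ALPHANUMERIC_BASE:
        return "operation"
    return "alphanumeric"

codes = [translate(ch) for ch in "A=B+1;"]
kinds = [kind(code) for code in codes]
```

Later phases can then classify any code with at most two comparisons, without consulting a table.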


An output string is organized in which each statement
is given a statement number.

The statements are scanned and whenever labels, file
names, parameters, or condition names are encountered, a
dictionary entry is created, and a pointer to the dictionary
entry is stored with the statement.


When UOP defining statements (e.g., DO, BEGIN) are
encountered in the scan, entries are made in a parsing table.
Then, when statements are encountered that end UOP's (e.g., END),
the table is searched to determine which entry is closed. The
table then contains the statement numbers defining the limits of
the UOP's.

At  the  completion  of  Phase  1,  the  parsing  table  has 
been  filled,  and  the  dictionary  has  been  partially  filled. 

Phase  2  reformats  the  source  text,  indicating  the 
parsed  units  and  statement  numbers.  The  units  are  indicated  in 
such  a  way  as  to  ease  reading. 

In  Phase  3,  DECLARE  statements  are  analyzed.  This  is 
done  using  an  array  of  attribute  masks.  Each  data  attribute  is 
represented  by  a  32-bit  mask  (row).  Each  element,  A(i,j) 
represents  the  interaction  of  attributes  i  and  j.  If  A(i,j)  is 
one,  then  the  two  attributes  can  co-occur.  Zero  means  that  they 
cannot  co-occur.  For  instance,  EXTERNAL  can  co-occur  with  FIXED 
but  not  with  INTERNAL.  Each  attribute  is  looked  up  in  the  table 
of masks and all of the masks are AND'ed together. The result is
a  32-bit  string  with  ones  representing  the  attributes  of  the 
DECLARE'd  data.  Note  that  by  starting  with  the  assumption  that 
all  attributes  apply  and  then  ruling  out  impossibilities, 
defaulted  attributes  are  also  depicted.  Scope  tables  are 
generated  for  the  data  and  these  plus  the  attribute  masks  are 
added to the dictionary, which is now completed.
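
The mask technique of Phase 3 can be shown on a reduced scale. The sketch uses four attributes rather than thirty-two, and the compatibility rows are an assumed subset chosen to match the example in the text (EXTERNAL may co-occur with FIXED but not with INTERNAL); they are not the analyzer's actual table.

```python
ATTRIBUTES = ["EXTERNAL", "INTERNAL", "FIXED", "FLOAT"]
BIT = {name: 1 << i for i, name in enumerate(ATTRIBUTES)}

def mask(*compatible):
    """Build a row with a one bit for each attribute that may co-occur."""
    value = 0
    for name in compatible:
        value |= BIT[name]
    return value

# Assumed compatibility rows, one per attribute.
MASKS = {
    "EXTERNAL": mask("EXTERNAL", "FIXED", "FLOAT"),
    "INTERNAL": mask("INTERNAL", "FIXED", "FLOAT"),
    "FIXED": mask("FIXED", "EXTERNAL", "INTERNAL"),
    "FLOAT": mask("FLOAT", "EXTERNAL", "INTERNAL"),
}

def analyze_declare(declared):
    """AND together the masks of the declared attributes."""
    result = (1 << len(ATTRIBUTES)) - 1   # start by assuming all attributes apply
    for name in declared:
        result &= MASKS[name]             # rule out what cannot co-occur
    return [name for name in ATTRIBUTES if result & BIT[name]]

attributes = analyze_declare(["EXTERNAL", "FIXED"])
```

Each declared attribute clears the bits of everything it excludes, so the surviving one bits are the attributes still consistent with the declaration.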

At  this  point  in  the  Program  Analysis  process,  two 
internal  tables  have  been  built--the  dictionary  and  the  parsing 
table.  Their  formats  are  described  below. 


a.  Dictionary 
(3) Bytes 7-8 contain, for structures, pointers to
structure elements, and for labels, the statement number of the
label declaration.

(4) Bytes 9-28 contain the identifier as it
appears in the source code.

(5) Bytes 29-32 contain a bit table that defines
the unique attributes or characteristics of the entry.

(6) Bytes 33-40 are a set of offset values that
point to the overflow area. Note that certain PL/I attributes
carry value information, e.g., precision, bounds of dimensions,
file environments, etc. This value data is stored in the
overflow area and the offsets are used to delimit the start and
stop of various values.

(7) Bytes 40-119 contain any values associated
with attributes.


Figure 2 illustrates a typical dictionary entry.

b. Parsing Table

To delimit UOP structures and keep track of UOP
nesting, this table is generated during the analysis. It also is
declared as an array of bit strings where each element represents
a single UOP. The format of each element is as follows:

(1) Bit 1 - status switch used to determine if
UOP has been closed.

(2) Bits 2-9 - UOP level code where code runs from
1 - 6, corresponding to JOB level to Segment level.

(3) Bits 10-73 contain the procedure name or label
on the including procedure.

(4) Bits 74-85 contain the statement number of the
first statement in the UOP.

(5) Bits 86-97 contain the statement number of the
last statement in the UOP.

(6) Bits 98-106 contain a dictionary pointer to the
label associated with the UOP, if any.

Figure 3 illustrates a typical parsing table entry.
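
The parsing table entry format can be exercised with a short packing routine. Two assumptions are made for the sketch: the first-statement field is taken to end at bit 85, immediately before the last-statement field that begins at bit 86, and the eight-character name is encoded in ASCII for convenience rather than in the machine's own character code.

```python
FIELDS = [            # (name, first bit, last bit), bits numbered from 1
    ("closed", 1, 1),
    ("level", 2, 9),
    ("name", 10, 73),
    ("first_stmt", 74, 85),
    ("last_stmt", 86, 97),
    ("dict_ptr", 98, 106),
]
TOTAL_BITS = 106

def pack(values):
    """Assemble one entry as an integer; bit 1 is the leftmost bit."""
    entry = 0
    for name, first, last in FIELDS:
        width = last - first + 1
        value = values[name]
        if value >= 1 << width:
            raise ValueError(f"{name} does not fit in {width} bits")
        entry |= value << (TOTAL_BITS - last)
    return entry

def unpack(entry):
    """Recover the individual fields from a packed entry."""
    out = {}
    for name, first, last in FIELDS:
        width = last - first + 1
        out[name] = (entry >> (TOTAL_BITS - last)) & ((1 << width) - 1)
    return out

def encode_name(text):
    """Eight 8-bit characters, blank padded (ASCII assumed)."""
    return int.from_bytes(text.ljust(8).encode("ascii"), "big")

values = {
    "closed": 1, "level": 5, "name": encode_name("LOOP1"),
    "first_stmt": 12, "last_stmt": 40, "dict_ptr": 7,
}
packed = pack(values)
```

With twelve bits per statement number, a single program of up to 4095 statements can be described by this layout.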

Figure 2. A Typical Dictionary Entry. Fields and starting bytes:
Hash Chain (1), Scope (3), Structure Pointers (7), Data Name (9),
Attributes (29), Table of Offsets (33), Overflow Area (40).

Figure 3. A Typical Parsing Table Entry. Fields and starting bits:
Switch (1), Level Number (2), Including Procedure Name (10),
First Statement Number (74), Last Statement Number (86),
Dictionary Pointer (98).

The string generated in Phase 1 is read in Phase 4, and
a new string is produced that is completely coded. All
identifiers are replaced with dictionary pointers.

Phase  5  determines  data  type  for  all  data  in  the 
dictionary  and  adds  a  data  type  code  to  the  dictionary. 

Phase 6 reads in the parsing table and reads in the
program statements, one at a time. From these it generates the
UOP records and, with the additional input of the dictionary, it
generates the trailer records. These are written out on tape.

The  JCL  cards  are  used  by  the  JCLSCAN  Program,  and  each 
card  is  examined  to  determine  if  it  is  a  JOB  card,  an  EXEC  card, 
a  DD  card,  or  other.  Cards  in  the  other  category  are  immediately 
rejected.  JOB  cards  are  further  examined  for  condition  codes  at 
the  JOB  level.  If  they  exist,  they  are  stored  on  an  analysis 
list. 

For EXEC cards, the program stores the job step name in
the analysis list and then determines if the name refers to a
catalogued procedure. If it does, the name is marked as job
level. If it does not, the name is marked as load module level.
The EXEC card is then checked for JOB STEP parameters, and if
there are any, they are stored in the analysis list. The same
process is followed for JOB STEP condition codes.

For  DD  cards,  the  DSNAME  is  stored  in  the  analysis  list 
along  with  any  disposition  parameters. 
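
The card classification just described can be sketched as follows. This is a simplified Python illustration, not the JCLSCAN program itself, whose internals are not given here. The function names and the shape of the analysis list are hypothetical. Two simplifications are assumed: the operand field is taken to end at the first blank, and a step is treated as naming a program only when its first operand is `PGM=`, which is how standard JCL distinguishes a program from a catalogued procedure.

```python
import re


def classify_card(card):
    """Return (kind, name, operands) for one JCL card image.

    kind is "JOB", "EXEC", "DD", or "OTHER". Cards of kind OTHER are
    rejected by the caller.
    """
    text = card[:71].rstrip()              # columns 1-71 carry the statement
    if not text.startswith("//") or text.startswith("//*"):
        return ("OTHER", "", "")
    body = text[2:]
    # The name field, if present, starts in column 3 with no leading blank.
    if body == "" or body[:1] == " ":
        name, rest = "", body.strip()
    else:
        name, _, rest = body.partition(" ")
        rest = rest.strip()
    operation, _, operands = rest.partition(" ")
    if operation not in ("JOB", "EXEC", "DD"):
        return ("OTHER", name, "")
    return (operation, name, operands.strip())


def scan(cards):
    """Build a simple analysis list from a deck of card images."""
    analysis = []
    for card in cards:
        kind, name, operands = classify_card(card)
        if kind == "OTHER":
            continue                       # rejected immediately
        params = operands.split(" ")[0]    # simplification: stop at first blank
        entry = {"kind": kind, "name": name}
        if kind in ("JOB", "EXEC"):
            entry["cond"] = "COND=" in params
        if kind == "EXEC":
            # Assumption: PGM= names a load module, anything else a procedure.
            entry["level"] = "load module" if params.startswith("PGM=") else "job"
        if kind == "DD":
            m = re.search(r"(?:DSNAME|DSN)=([^,]+)", params)
            entry["dsname"] = m.group(1) if m else None
            m = re.search(r"DISP=(\([^)]*\)|[^,]+)", params)
            entry["disp"] = m.group(1) if m else None
        analysis.append(entry)
    return analysis
```

The second pass over the analysis list, in which UOP records are formed, would work from these entries rather than from the card images.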

After  all  of  the  JCL  cards  have  been  read,  the  analysis 
list  is  further  processed,  the  process  varying  with  the  type  of 
JCL  statement. 

JOB - The UOP name is extracted from the job statement
label field. The entry and exit portions of the UOP are marked,
and if condition codes exist, subordinate UOP are marked as exit
points.

DD - A data reference entry is made in the UOP for the DD.

EXEC - JOB STEPS become subordinate units to the JOB
UOP. The UOP names are the JOB STEP names. If JOB STEP
parameters exist, a data reference entry is made using a dummy
name. If JOB STEP condition codes exist, the subordinate
transfer table is marked accordingly.


The Linkage Editor Analysis Program (LEAP) begins by
reading from the primary input stream. A test is made to
determine if the first entry in the stream is a linkage editor
command. If it is not, the entry is processed as an object
module. If it is, another test is made to determine if the
command is an INCLUDE statement. INCLUDE statements effect
readings from secondary input streams. All other command types
are ignored. Object module processing continues in the primary
stream until an INCLUDE is found. Then, processing shifts to the
secondary stream. In the secondary stream, the first column of
the card image is checked for a blank (indicating command), a
12-9-2 punch (indicating an object module), or any other
(indicating a load module). The first two are processed as
previously described, and the third (load module) causes a load
module subordinate unit entry to be established.


Once  all  of  the  linkage  editor  object  modules,  load 
modules  and  commands  have  been  processed,  a  UOP  record  is  formed. 
This  UOP  is  in  the  same  format  as  a  PL/I  UOP. 


3.  Additional  Requirements.  There  must  be  added  to  the  program 
analysis  implementation  an  incremental  analysis  capability.  When 
a  programmer  makes  a  change  to  an  existing  program,  he  should  not 
have  to  run  the  entire  program  through  analysis  again.  This 
process  now  takes  an  amount  of  time  on  the  same  order  as  a  full 
compilation,  hence  in  a  large  system  it  could  become  a 
significant  drain  on  computer  capacity  if  repeated  often. 
Instead,  the  approach  recommended  is  to  have  ESDP  store  the 
latest  copy,  and  let  the  programmer  make  changes  by  use  of  ADD, 
CHANGE, and DELETE commands, treating his stored program as a
file.  In  this  way,  PA  need  only  analyze  the  changes  and  make 
minimal  modification  to  the  canonical  data  file,  and  new 
interrogations  can  be  initiated  only  on  those  portions  of  the 
program  affected. 

4,

which  al 

ore  detailed  information  than  that  produced  by  the 
program  is  needed  for  classifying  the  manner  in  which 
are  used  or  referred  to  within  the  program  text.  For 
of  reasons  (e.g.,  making  up  more  detailed 
ans,  assisting  in  test  planning,  assisting  in 

and  providing  better  cross  indexing  of  documentation) , 
t  data  usage  should  be  classified  in  as  much  detail  as 
Furthermore,  the  information  desired  is  available 
program  analysis  function,  but  currently  is  discarded 
n  saved  (this  is  also  true  of  compilation) .  A 
1  classification  code  should  be  used  for  each 

of  a  data  label.  This  code  should  reflect  whether  the 
nged  by  this  usage  or  not;  whether  it  is  changed  by 
gned  a  new  value  or  having  a  new  value  read  in: 
is  used  without  being  changed;  whether  when  used  in  an 


statement, it is used as an item in
another  item,  a  control  item, 
on  to  such  a  classification  system  is 
so  appeared  in  Volume  1. 

1. Context of Appearance

   1.1 Assignment Statement
       1.1.1 Computed Value
       1.1.2 Argument
   1.2 Control Statement
       1.2.1 Variable I/O Command
       1.2.2 Branching or Transfer Command
             1.2.2.1 Argument or condition statement (IF, ON ...)
             1.2.2.2 Iterative Control Variable (DO)
                     1.2.2.2.1 Initial index value
                     1.2.2.2.2 Increment
                     1.2.2.2.3 Maximum value or limit
             1.2.2.3 Variable address
   1.3 Subroutine/Function/Macro Calling Sequence
       1.3.1 Transmitted to SR/Function/Macro
       1.3.2 Received from SR/Function/Macro
   1.4 Data Declaration Statement (or other non-executable
       statement)
   1.5 Input/Output
       1.5.1 Input
             1.5.1.1 Input Control Variable
             1.5.1.2 Data Element read in
       1.5.2 Output
             1.5.2.1 Output Control Variable
             1.5.2.2 Data Element written out or transmitted

2. Change Status

   2.1 Value Changed by Containing Statement
       2.1.1 Value Directly Assigned by Assignment Statement
       2.1.2 Value Directly Changed by DO Statement
       2.1.3 Value Directly Changed by Variable I/O Statement
   2.2 Value not Changed by Containing Statement

3. Structural Role

   3.1 Data Element is a Structure or Array
   3.2 Index or Subscript
       3.2.1 Value of an Index
       3.2.2 Element of an Index Term
   3.3 Scalar Item


Figure 4. Classification of Data Usage by a Program.
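
As an illustration of how the codes of Figure 4 might be attached to each appearance of a data label, the following Python sketch tags the labels in a single assignment statement. It assumes one code from each of the three top-level classes per appearance. The names `falls_under`, `usages` and `changed_labels`, and the sample statement, are hypothetical.

```python
def falls_under(code, category):
    """True if code equals category or lies beneath it in the hierarchy."""
    return code == category or code.startswith(category + ".")


# Usage of each data label in the statement:  TOTAL = RATE * HOURS;
# Each appearance carries (context, change status, structural role).
usages = [
    ("TOTAL", ("1.1.1", "2.1.1", "3.3")),  # computed value, assigned, scalar
    ("RATE",  ("1.1.2", "2.2",   "3.3")),  # argument, not changed, scalar
    ("HOURS", ("1.1.2", "2.2",   "3.3")),  # argument, not changed, scalar
]


def changed_labels(usages):
    """Names whose value is changed by the containing statement (class 2.1)."""
    return [name for name, codes in usages
            if any(falls_under(c, "2.1") for c in codes)]
```

Because the codes are hierarchical, a question such as "where is this item changed" can be answered at class 2.1 without listing each of its subclasses.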




Another  aspect  of  program  analysis  (or  possibly  of 
information  retrieval)  to  be  borne  in  mind  is  that,  as  the 
documentation  files  grow  large,  there  will  be  inevitable  errors, 
such  as  programmers  misnaming  programs,  submitting  the  wrong 
version  of  a  program  for  analysis,  entering  changes  incorrectly 
(resulting  in  an  actual  program  that  differs  from  what  the  author 
thinks  it  is),  etc.  These  are  normal  mistakes  of  any  programming 
project  and,  in  a  purely  manual  system  they  can  be  tolerated  and 
relatively  easily  reversed.  The  documentation  file  system  and 
the  program  analysis  system  must  be  so  designed  as  to  anticipate 
such  errors  and,  while  it  is  not  ESDP's  responsibility  to  detect 
them,  it  should  be  possible  within  ESDP  to  correct  them  with 
minimum  difficulty. 

CONVERSATIONAL PROCESSING


A  thorough  description  of  the  converse 

V


INFORMATION  RETRIEVAL 


1. ESDP Files. The files handled in the ESDP system include the
following:

a.  Program  Description  File  Set 

This file set contains a record for each UOP in the
object system. The information in the file may be derived
through program analysis, interrogation, or both with the source
being identified.


b.  Data  Description  File  Set 

This  file  contains  a  record  for  each  Unit  of  Data 
(UOD). Here, the information is obtained through interrogation
only.  UOP  and  UOD  are  linked  via  pointers  since  the  data  are 
referenced  in  UOP. 


c.  Index  File  Set 


A file is built ... words indicating for each the UOP
or UOD and ... the key word. The key words are ... automatically
... time that a response to a question ... At that time, the UOP
name of ... the interrogation and the IEN ... are appended to all
key words in the response.


d. Buffer File Set


Provision  is  made  in  the  ESDP  concept  for  general  file 
handling  capabilities.  Programs  that  interpret  file  format 
tables for file accessing will be included. In addition, file
building  may  be  done  on  line  as  well  as  off  line.  The  intended 
use  of  the  special  files  is  as  personalized  subsets  of  the  ESDP 
data  base.  It  is  anticipated  that  this  feature  would  be  heavily 
used  by  system  managers  to  create,  update,  and  search 
personalized  management  information  systems. 


2. File Building and Maintenance. Creation and modification of
UOP records and UOD records are planned in ESDP to match the
changes in the object system of programs. Records are created
whenever the system becomes aware of a new UOP or UOD. This
information may be acquired in any one of a number of ways:


a.  Source  Code  Parsing 


Program  Analysis  creates  UOP's  by  parsing  source 
language  code.  UOP's  created  are  named  either  by  program  label 
or  by  a  combination  of  containing  UOP  name  and  statement  numbers. 

b. Source Code References

References  may  be  made  in  the  source  code  to  UOD  or 
other  UOP  not  subject  to  Program  Analysis.  The  appearance  of 
these  references  in  source  coding  will  cause  the  creation  of  the 
appropriate records and names will be taken from the source code.

c.  Interrogation 

UOP or UOD may also be named by the programmer at a
console in the interrogation process. This can occur during
design interrogation or during interrogations performed after the
object system program has been subjected to program analysis.
Naming of these UOP's and UOD's is a simple process since the
programmer assigns the names.

Whenever changes to source code are submitted for
analysis or incremental interrogations are processed, changes to
data items in existing UOP or UOD records are likely to take
place. The changes can take the form of ADD, DELETE, or REPLACE
(DELETE and ADD). The way in which the system handles the file
updating will depend on the data elements to be changed and the
manner in which the requested changes are entered into the
system.


ADD'ed data items derived from interrogation may be
handled directly since a full record is created for each UOP or
UOD whether or not all of the data item fields contain
information. Therefore, an ADD amounts to storing the new
information into an appropriate position in the core-resident
image of the UOP/UOD record, and rewriting the record to the disk
file. Cross references are added in the normal manner. DELETE's
and REPLACE's present a more difficult updating problem, however.
Again, data may be deleted in the same manner as it was added
above. In this case, however, cross references must also be
updated. For instance, assume that a programmer wishes to delete
textual information associated with a given IEN. The text may be
deleted, but keyword references to the text must also be deleted.
This will be done by performing a second keyword extraction on the
text to be deleted. The keywords extracted will then be used as
search arguments for the keyword index records so that the
appropriate IEN pointers may be deleted.
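
The deletion of keyword references can be sketched as follows. This is a minimal Python illustration under stated assumptions: the keyword index is held as a mapping from keyword to a set of IEN's, IEN's are written as strings such as "1.2", and the common word list is a small stand-in. None of these representations is specified in the text.

```python
import re

# Illustrative common word list only; the real list is not given.
COMMON_WORDS = {"A", "AN", "AND", "IN", "IS", "OF", "THE", "TO"}


def extract_keywords(text):
    """Second keyword extraction: the non-common words of the text."""
    words = re.findall(r"[A-Z0-9]+", text.upper())
    return {w for w in words if w not in COMMON_WORDS}


def delete_text(index, ien, text):
    """Remove the IEN pointer from every keyword found in the deleted text."""
    for word in extract_keywords(text):
        iens = index.get(word)
        if iens is None:
            continue                  # word was never indexed
        iens.discard(ien)
        if not iens:
            del index[word]           # no remaining references to this keyword
```

Only the keyword records named by the extraction are touched, so the cost of a deletion depends on the length of the deleted text and not on the size of the index.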

The general concept for ESDP file updating as a result
of changes to source programs is to rerun Program Analysis on the
UOP containing the changed UOP. Old UOP records will not be
erased at this time. Through the reconciliation process,
information associated with the old UOP records will be linked to
the new UOP records. When the reconciliation has been completed,
the old records will be deleted. Keyword references must be
updated during the reconciliation process. If text from an IEN
of the old version of a record is to be moved to another IEN of
the new version, the keyword updating amounts to changing the IEN
pointers in all of the appropriate keyword records. If new text
is typed in, keywords are extracted in the normal fashion. For
all deleted records, the keyword deletion is performed as in the
case of deleted text through incremental interrogation.


3.  Keyword  File.  An  experimental  keyword  extraction  program  is 
now  being  tested.  This  program  operates  as  follows: 


(1) Responses to questions are subjected to key word
extraction under the control of the CEL Program.


(2)  Responses  are  edited  to  eliminate  deleted 
lines,  to  eliminate  deleted  characters  (backspace  and  retype),  to 
eliminate  carriage  returns,  and  to  convert  all  letters  to  upper 
case.  This  is  done  to  eliminate  mismatches  in  the  keyword  list. 
For  instance,  "Computer"  without  such  editing  would  not  match 
with  "computer"  and  similarly  carriage  return  characters, 
backspace  characters,  or  delete  characters  will  eliminate  any 
possibility  of  an  exact  match. 


(3)  Each  word  in  the  response  is  compared  with 
words  in  a  common  word  list.  Common  words  are  not  stored  as 
keywords. 


(4)  Each  keyword  (i.e.,  not  common  word)  is 
stored  and  is  tagged  with  the  IEN  associated  with  the  question  to 
which  this  is  a  response.  If  the  keyword  is  already  recorded, 
the  IEN  is  added  to  a  list  of  IEN's  in  which  the  word  appeared. 
In addition to the IEN, the keyword could be tagged with its
position within the response. This would enable subsequent
retrieval based on position of keywords in a response.
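
Steps (2) through (4) can be sketched as follows. This is a minimal Python illustration, not the experimental program itself. The common word list, the IEN strings and the index layout (keyword to IEN to list of word positions) are assumptions. The convention for marking a deleted line is not given in the text, so that part of the editing step is left out.

```python
import re

# Illustrative common word list only; the real list is not given.
COMMON_WORDS = {"A", "AN", "AND", "ARE", "BE", "IN", "IS", "OF", "THE", "TO"}


def edit_response(raw):
    """Apply backspaces, drop carriage returns, and fold to upper case."""
    out = []
    for ch in raw:
        if ch == "\b":
            if out:
                out.pop()             # backspace erases the previous character
        elif ch in "\r\n":
            out.append(" ")           # a line break becomes a word break
        else:
            out.append(ch)
    return "".join(out).upper()


def index_response(index, ien, raw):
    """Add each non-common word of a response to the keyword index.

    index maps keyword -> {ien: [positions]}, so later retrieval can use
    the IEN alone or the position of the word within the response.
    """
    words = re.findall(r"[A-Z0-9]+", edit_response(raw))
    for position, word in enumerate(words, start=1):
        if word in COMMON_WORDS:
            continue                  # common words are not stored
        index.setdefault(word, {}).setdefault(ien, []).append(position)
    return index
```

With the editing applied first, a response typed as "Compx", backspace, "uter" is indexed under the same keyword as "computer" typed cleanly.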

4.  Searching.  Information  in  ESDP  is  indexed  in  four  ways: 

a.  Program  Element 

One  index  to  a  piece  of  data  is  the  particular  element 
(UOP,  UOD,  etc.)  with  which  it  is  associated.  This  information 
is  obtained  through  interrogation  for  design  documentation  and 
through  program  analysis  for  final  documentation. 

b.  Keywords 

Another  index  is  the  keyword  index.  The  keywords  are 
extracted  automatically  from  responses  to  interrogation 
questions. 

c.  Data  Names  and  Labels 

These are character strings used in the program or the
program  design  being  documented.  They,  too,  serve  as  indexes  to 
the  UOP  or  UOD  records. 

d.  Hierarchic  Code 

ESDP  employs  a  hierarchical  coding  system  and  attaches 
a  code  number  to  each  element  of  data.  This  number  is  called  an 
Information Element Number (IEN). The structure of these
numerical  codes  is  intended  to  classify  any  data  collected  by 
ESDP  about  an  object  system  of  programs. 

Searching  of  the  ESDP  files  is  requested  from  terminals 
or ESDP programs. The query language is basically the same
subset  of  PL/I  as  is  used  for  executive  programming.  Again,  the 

Cyclic retrieval is defined as the use of information
retrieved  from  one  query  as  part  of  the  statement  of  a  subsequent 
query  to  the  same  or  a  different  file,  so  that  a  cycle  of  query, 
retrieval,  query  based  on  retrieval  data,  retrieval,  etc.,  can  be 
set  up. 
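
A single cycle of this kind can be sketched as follows. This is a minimal Python illustration with hypothetical file contents and a hypothetical `query` helper. It is not the CEL query language, which is described elsewhere in the report.

```python
def query(records, predicate, fields):
    """Return the requested fields of every record satisfying predicate."""
    return [{f: r[f] for f in fields} for r in records if predicate(r)]


# Hypothetical file contents, for illustration only.
uop_file = [
    {"name": "PAYCALC", "references": ["RATE", "HOURS"]},
    {"name": "REPORT", "references": ["TOTAL"]},
]
uod_file = [
    {"name": "RATE", "type": "FIXED"},
    {"name": "HOURS", "type": "FIXED"},
    {"name": "TOTAL", "type": "FLOAT"},
]


def cyclic_example():
    # First query: which data names does PAYCALC reference?
    first = query(uop_file, lambda r: r["name"] == "PAYCALC", ["references"])
    # The retrieved names are held in a buffer for the next query.
    buffer = {n for row in first for n in row["references"]}
    # Second query, to a different file, stated in terms of the first result.
    return query(uod_file, lambda r: r["name"] in buffer, ["name", "type"])
```

The second query could in turn feed a third, which is what makes the process a cycle.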

writing out the query in a more natural form (not natural
language,  but  a  programming-like  language)  or  building  his  query 
gradually  through  a  computer  assisted,  conversational  process. 


In regard to performance, while specifications have not
yet been developed, it seems that the following are required:


o  Records must be retrievable on the
obvious characteristics which are
usually unique identifiers: address,
sequence number within a file, value of
a key or sort field.


o  Records must also be retrievable on the
basis of Boolean combinations of these
or other record attributes, each
attribute (probably) being able to be
stated as one or more relationship
statements, as SALARY = 10000 or AGE
< 40.


o  Individual items, fields, arrays, sub-records,
etc., can be specified as the
information to be retrieved from a
record--the entire record need not be
retrieved in response to a query. Thus,
the burden of extracting the exact
information needed from a record is
placed upon the retrieval system, not
the calling program.


o  Information called for may be ordered to
be held in a buffer or temporary storage
area for later reference. In particular,
this requirement is imposed to make
cyclic retrieval possible.


o  The requestor, whether a person or a
program, may specify the recipient of
the information, which need not be the
requestor. In other words, an IR system
user may call for the retrieval of
information and its presentation to some
other person, output device, or program.

It should be noted that the requirements are for a system
responsive to both human requestors and programs, with one of the
using programs being a query acquisition program that
communicates with the human user.


A  feature  that  will  be  required  of  ESDP  will  be  to 
index  narrative  interrogation  responses  to  permit  access  by  the 
retrieval  system  on  the  basis  of  response  content.  There  are 
several  reasonably  well  established  techniques  for  doing  this. 
One is to use a list of "common" words (articles, the forms of
the verb to be, etc.), delete these from responses, truncate the
remaining words at five or six letters and use them as a keyword
index.  Alternatively,  a  dictionary  of  system  terms  can  be  built 
and  this  used  to  identify  words  in  a  response  that  ought  to  be  in 
the  index  to  the  response.  This  list  must  be  constantly  modified 
to  be  sure  it  is  up  to  date.  Another  automatic  technique  that 
might  be  useful  is  to  require  that  a  special  character  precede  or 
follow  a  data  element,  program  name,  or  other  system  label  when 
used  in  text.  In  this  way,  any  cross  reference  in  a  response  can 
be  readily  identified. 

More  generally,  the  logic  of  computer  assisted 
interrogation  gives  us  the  following  information  about  a 
narrative  documentation  item,  before  it  has  been  elicited  from 
the  programmer: 

o  Subject of the question--name of UOP or
data element, particular aspect being
questioned.

o  Structural relationship, for cross-
referencing purposes, with other programs
or files.

These  items,  combined  with  keywords  extracted  from  the  response, 
give  the  potential  of  a  very  rich  keyword  index  for  use  in 
querying  or  in  automatic  dissemination.  The  same  items  can  be 
used  to  form  an  index  in  each  published  report.  These  indexes 
would,  of  course,  be  automatically  modified  if  the  basic 
documentation  were  modified,  either  through  interrogation  or 
program  analysis. 


We anticipate that some number of standard queries will be
previously written and invoked by the user as he needs them.
Some of these queries may be complete as stored and some may need
completion or assignment of values to parameters through
interrogation.



This  type  of  standard  query  should  be  quite  easy  to 
implement.  The  majority  of  queries,  however,  will  be 
unanticipated.  These  will  be  processed  through  an  interpreter 
program designed especially to operate on queries expressed in
the CEL. The interpretive approach is dictated since compilation
of queries cannot be performed rapidly enough to permit an
efficient  on-line  system. 

Many  information  retrieval  queries  will  be  of  a  form  in 
which  a  single  data  file  is  used  and  a  single  IF  statement  is 
sufficient  to  decide  upon  record  selection.  Often,  the  key  of 
the  record  will  be  given  so  the  desired  record  may  be  immediately 
retrieved.  If  the  key  is  not  given,  the  implication  is  that  each 
record  must  be  examined  for  its  compliance  with  the  query,  a 
process  considerably  shortened  by  the  use  of  inverted  indexes,  if 
they  exist.  Checking  for  the  existence  of  these  indexes,  and 
making  use  of  them,  is  a  function  of  the  interpreter. 
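
The choice of access path made by the interpreter can be sketched as follows. This is a minimal Python illustration under stated assumptions: records are held in a list, the primary index maps a key to a record position, and each inverted index maps a field value to a list of positions. The function name `run_query` and its parameters are hypothetical.

```python
def run_query(records, primary, inverted,
              key=None, field=None, value=None, test=None):
    """Return (access_path, matching_records) for a single-file query.

    key           record key, if the query supplies one
    field, value  an equality condition that an inverted index may serve
    test          any further condition from the IF statement
    """
    test = test or (lambda r: True)
    if key is not None:
        # Key given: the record is retrieved immediately.
        pos = primary.get(key)
        hits = [records[pos]] if pos is not None else []
        return "key", [r for r in hits if test(r)]
    if field is not None and field in inverted:
        # An inverted index exists: examine only the records it names.
        candidates = inverted[field].get(value, [])
        return "inverted index", [records[p] for p in candidates
                                  if test(records[p])]

    # No key and no usable index: every record must be examined.
    def matches(r):
        return (field is None or r.get(field) == value) and test(r)

    return "scan", [r for r in records if matches(r)]
```

The query itself is the same in all three cases. Only the interpreter needs to know which indexes exist.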

In  a  typical  query,  the  program  will  have  been  written 
in  skeleton  form,  and  the  remaining  data  is  acquired  at  the  time 
of  invocation.  The  items  acquired  are: 

o  Record selection criteria--a single IF
statement, although containing any number
of clauses.

o  The "THEN" functions--what to do with a
selected record, e.g., RETRIEVE items A,
B, C, retrieve A to B(1,1) ("retrieve
item A and place it in record 1 of
Buffer File 1.")

o  The "ELSE" functions--iteration logic
will be built into the original, but the
user can add functions. He may, for
example, choose to retrieve on the basis
of a false IF condition.

o  Additional commands, such as DEFER,
SAVE, etc.

VI


PUBLICATIONS 


There  are  several  classes  of  publications,  with  fairly 
important  economic  differences  among  them.  There  will  be  a  class 
of  output  for  design  notes  and  change  notices.  This  class  will 
be  characterized  by  large  volume  and  high  frequency  of  issue, 
especially  early  in  a  system  development  cycle.  These  documents 
must  be  disseminated  fully  and  rapidly.  There  is  no  great  need 
for  many  of  the  niceties  of  publication  that  are  useful  in  other 
forms  of  documentation.  They  can  be  printed  at  the  consoles  used 
by  the  recipients,  or  they  can  be  batch  printed  on  a  high-speed, 
centralized  printer  and  disseminated  through  the  organization's 
regular  internal  mail  system. 

As  design  and  production  progress,  programmers, 
designers and managers will want fairly complete documents on
their own and closely related programs and files. These will be
used for ready reference, and possibly for making notes to be
used later, during interrogations. This class of documentation
is characterized by larger documents of lower frequency of issue,
but probably benefiting from more careful physical layout and
printing. They will, of course, change often, but many times the
holder  of  such  reports  can  attach  a  change  notice  directly  to 
this  report  copy,  or  make  a  hand-written  note  thereupon.  He  need 
not  reproduce  the  entire  report  every  time  there  is  a  change  to 
it. 


A  third  class  of  documentation  is  the  formal 
documentation  normally  produced  at  the  end  of  a  project,  or  for 
major  progress  or  milestone  reports.  These  are  printed  much  less 
often than the others, but require many printing features not
always  available  on  computer-generated  documents. 

It appears, at this point, that the logical
capabilities represented by existing programs, such as FLOWCHART
[1] and Administrative Terminal System [2], will handle most of
these documentation problems.

ATS offers all needed features except ability to handle
graphics. It offers a much wider choice of type fonts when
printing at a terminal with changeable type elements, and the
ability  to  underline  text.  Variation  in  type  fonts  for 
programming  documentation  is  useful  to  help  distinguish,  for 
example,  between  labels  or  data  names  and  normal  English  usage, 
as  SPEED  is  a  data  item,  but  speed  is  a  rate  of  motion. 

AUTOCHART [4] enables the entry of flow charts and
tables. It is designed to accept manually prepared input, hence
should be able to interface smoothly with the interrogation
processor. The designer does his own flow chart layout. The
compensation for the extra work of doing this is a compact chart
organized in the most meaningful way, to the author. Tables and
charts can be modified without complete regeneration, using an
updating interrogation, as in CAINT.

VII


FILE  PROCESSING 


1.  Requirements.  From  an  operational  viewpoint,  ESDP  imposes 
the  following  storage/retrieval  requirements: 

a.  Data  stored  on  or  transferred  to  bulk  storage  must 
be  directly  accessible  to  satisfy  a  broad  range  of  user  query 
requirements  and  data  storage  requirements  from  on-line  consoles. 


b.  Large  data  base  processing  capabilities  must  be 
provided  in  order  not  to  restrict  the  size  of  user  programming 
systems. 

c.  Evolutionary  file  growth  must  be  accommodated  since 
at  the  outset  of  the  programming  development  cycle  the  ESDP  files 
are  empty  and  evolve  as  the  user's  programming  system  develops. 


d.  Highly  variable  record  lengths  must  be  allowed 
since  these  are  dictated  by  the  varying  characteristics  of  the 
programs  comprising  the  user's  programming  system. 


e.  The  processing  cannot  rely  on  predetermined 
knowledge  of  the  distribution  of  search  keys,  used  in  accessing 
data,  since  these  are  dictated  by  the  symbol  coding  conventions 
adopted  for  the  user's  programming  system  and  by  his  natural 
language  responses  to  interrogation. 


f.  Certain  files  are  directly  related  to  others.  For 
example, keywords are related to the UOP in which they were used.
Therefore,  access  to  one  may  necessitate  access  to  the  other. 


The ESDP file processor addresses these requirements and attempts
to  provide  a  solution  that  effectively  handles  each  requirement 
within  the  total  context.  While  this  may  not  be  the  optimum 
solution  for  any  given  requirement,  when  considered  by  itself,  it 
does  cope  with  the  totality  of  requirements  in  an  effective 
fashion. 

A  total  file  management  or  information  processing 
system  was  not  considered  to  be  an  appropriate  development  based 
on  ESDP  requirements.  The  preferred  approach  was  to  develop  a 
set  of  generalized  modules  to  perform  discrete  functions  which 
would  be  usable  throughout  the  ESDP  system. 

Experimental  versions  of  the  file  processing  routines 
have  been  written  in  the  PL/I  language.  The  physical  file 
processing uses the Basic Direct Access Method (BDAM) [ *> ] through
PL/I. All data sets are physically organized by regions, where a
region is defined as a unit of storage, equivalent to a disk
track. This equivalence is based on the current PL/I
implementation  and  may  vary  as  other  storage  devices  are 
supported  in  subsequent  implementations. 

2. Rationale for ESDP Approach. The following discussion is
limited to accessing techniques for files where data or records
must be directly accessed. It excludes techniques which rely on
sequential data organization and on a total file scan. While the
latter have application in certain classes of retrieval problems,
this is not the case in the ESDP system, since we are dealing
with a large data base and a non-batched query/retrieval
environment.

The  accessing  problem  is  one  of  uniquely  locating  each 
unit  of  data  within  a  file.  Two  general  techniques  can  be  used 
to  perform  this  location  function;  namely,  table  look-up  and 
randomization  techniques. 

For the ESDP system, randomization techniques were
rejected as the basis of the file accessing mode. First, as
noted earlier, the names or keys used in ESDP file accessing are
dictated by the symbol coding conventions adopted for a user's
programming  system  and  his  natural  language  responses  to 
interrogation.  They  cannot  be  predetermined.  No  known 
randomization  technique  exists  which  can  produce  satisfactory 
results,  given  any  key  set. 

Second, randomizing techniques are useful only for a
single access path to file data (i.e., access through a single
key  set).  Because  of  the  nature  of  the  data  in  the  ESDP  files, 
multiple  access  paths  must  be  available  to  the  same  data.  Thus, 
table  look-up  techniques  would  be  required  to  handle  the 
secondary  key  sets  and  access  paths. 

Randomizing techniques are more effective in loosely
packed file situations. Efficiency drops sharply as denser
packing  is  used.  The  resultant  increase  in  storage  requirements 
cannot  be  offset  by  comparable  table  look-up  storage 
requirements.  Thus,  this  technique  would  unduly  tax  storage 
requirements.  File  maintenance  also  becomes  a  problem  if 
extensive  chains  develop. 

The alternative to randomization is some form of table
look-up  which  is  the  method  employed  in  the  ESDP  file  processor. 
Table  look-up  techniques  employing  indices  have  been  used  in  many 
other  systems  and  are  the  basis  of  the  index  sequential  access 
method  in  System/360.  Essentially,  a  table  entry  is  created  for 
each  name  or  key  used  in  accessing  a  file,  and  an  address  of  the 
appropriate  file  location  is  stored  with  the  key.  When  the  table 
is  searched,  the  required  storage  key  can  be  obtained  directly. 
Various  searching  algorithms  can  be  used  depending  on  the 
ordering  of  the  keys  in  the  table. 

The  most  efficient  searching  techniques  require  an 
ordered  (typically  alphabetic  sort  order)  table  based  on  key 
characteristics. A major problem arises with these techniques
when applied to evolving tables or indices. Either strict order
is maintained by physically rearranging the index when new
entries  are  inserted  or  chaining  techniques  are  used.  With  the 
latter  technique,  new  entries  are  not  inserted  in  sequence  but 
stored  separately  and  a  reference  inserted  at  the  required  point 
in  the  sequence.  To  avoid  extensive  chain  processing,  file 
maintenance  of  the  indices  is  periodically  required,  with  the 
frequency of the period dictated by the index growth pattern.

To  avoid  this  maintenance  and  reorganization  problem, 
the  ESDP  file  processor  uses  a  different  technique  for  index 
building and searching, which is a take-off on existing list
processing  ideas. 

In the ESDP file system, the index is treated as a
group  of  entries  which  are  physically  strung  together  into  a 
list,  not  necessarily  contiguously,  and  which  are  logically 
ordered  or  sequenced  by  the  use  of  pointers  or  address  indicators 
which  are  appended  to  each  entry.  Because  of  this  uncoupling  of 
the  physical  and  logical  ordering  of  the  index  (or  any  list),  we 
can  eliminate  the  index  reorganization  problem,  and  with  some 
other  simple  techniques,  the  index  maintenance  problem. 


A  binary  tree  structure  was  selected  to  permit 
efficient  search  strategies,  based  on  binary  search  techniques. 
The  form  of  the  index  entry  (or  structure  node)  adopted  for  the 
ESDP case is shown in Figure 5. Here:


a. The Index Key Field contains the key or name used
to access file data. This field contains such elements as (1) the
names of Units of Programming (UOP); (2) data variable
names; or (3) descriptor terms (i.e., keywords).

b.  The  Low  Sequence  Pointer  contains  the  address  of 
another  index  entry  whose  key  is  lower  in  sort  sequence  than  the 
key  of  the  record  being  examined.  Similarly,  the  High  Sequence 
Pointer  contains  the  address  of  a  record  whose  key  is  higher  in 
sort sequence than the key of the record being examined.

c. The Data Field contains any additional data that is
desired to be stored in the index. For ESDP, this field could be
used for citation lists, disk addresses and allocation and
addressing controls.

Index Key Field
Low Sequence Pointer
High Sequence Pointer
Data Field

Figure 5. Index Entry

This general form was defined to permit the
implementation of a single program to perform index building and
searching of a variety of indices, each of which had a different
specific organization. As is typical in list processing, an initial
pointer or 'anchor' is maintained that points to the first index
entry or head of the list.

3. Prototype ESDP Index Implementation. Typically, list
processing  techniques  have  been  applied  to  lists  which  can  be 
maintained  in  core  memory.  For  the  ESDP  problem,  the  file  sizes 
and  index  requirements  are  too  large  to  justify  core  resident 
indices;  thus,  some  different  techniques  had  to  be  employed  which 
could  operate  with  a  disk  resident  index.  First,  indices  were 
segmented and these segments were the units for storing and
retrieving  from  disk.  The  selected  segment  size  was  set  at  the 
track  size  of  the  disk  unit  used.  The  following  PL/I  structure 
declaration  defines  the  segment  format  used: 

SEGMENT  FORMAT 
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ANC  is  a  pointer  that  indicates  which  entry  in  the  segment 
can  be  used  for  the  next  index  term  to  be  stored.  If  the 
value of ANC is zero, then the segment holds the maximum
number of entries, i.e., it is full.


Each  index  segment  must  be  initialized  before  operation  as 
follows: 


a.  Each  index  field  (KEE)  is  set  to  binary  zero. 

b.  Each  high  sequence  pointer  is  set  to  binary  zero. 

c. Each low sequence pointer is set with the subscript
value of the next entry in the array (MINSTRUCT). LOW
(N) is set to binary zero.

d. ANC is given the value (1) (i.e., initially the
first segment entry is to be used for the first index
entry in the segment).
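
The initialization steps above can be sketched in modern terms as follows. This is an illustrative Python sketch, not the PL/I of the prototype. The field names KEE, LOW, HIGH and ANC come from the text; the class names, the helper free_slots, the use of an empty byte string for "binary zero", and the small segment size are assumptions made for illustration.

```python
from dataclasses import dataclass, field

N = 8  # hypothetical number of entries per segment, kept small for illustration


@dataclass
class Entry:
    kee: bytes = b""   # index key field; empty stands in for binary zero
    low: int = 0       # low sequence pointer (doubles as free-chain link)
    high: int = 0      # high sequence pointer


@dataclass
class Segment:
    anc: int = 0                                  # subscript of next free entry
    entries: list = field(default_factory=list)   # entries[0] is unused padding


def init_segment(n: int = N) -> Segment:
    """Initialize a segment following steps a through d (1-based subscripts)."""
    seg = Segment()
    seg.entries = [Entry() for _ in range(n + 1)]
    for i in range(1, n + 1):
        seg.entries[i].kee = b""                       # a. key set to binary zero
        seg.entries[i].high = 0                        # b. high pointer set to zero
        seg.entries[i].low = i + 1 if i < n else 0     # c. chain to the next entry
    seg.anc = 1                                        # d. first entry is used first
    return seg


def free_slots(seg: Segment) -> list:
    """Walk the chain of unused entries, starting at ANC and following LOW."""
    slots, i = [], seg.anc
    while i != 0:
        slots.append(i)
        i = seg.entries[i].low
    return slots
```

Threading the unused entries through their LOW pointers means that no separate table of free slots is needed; ANC alone identifies the next entry to use.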


A  generalized  index  search  routine  has  been  written 
which  has  multiple  entry  points  depending  on  the  number  of 
discrete  indices.  (At  present,  eight  indices  are  maintained;  six 
corresponding to the six UOP levels, one for data definitions and
one  for  keywords.)  Upon  access,  this  routine  reads  the  first 
segment  of  the  specified  index,  which  contains  the  anchor  or 
start  of  the  index  list.  It  begins  comparing  the  passed  search 
key  against  the  KEE's  (stored  keys)  in  the  segment.  If  this  is 
the  first  entry  in  the  index,  then  KEE  (1)  in  segment  one  equals 
binary  zero  and  no  match  is  found.  The  no  match  occurs  whenever 
the  contents  of  KEE  differ  from  the  passed  search  key.  If  the 
passed  key  and  the  entry  in  KEE  match,  then  the  routine  returns 
to  the  calling  program  and  passes  back  the  subscript  value  of  the 
matching  entry  and  disk  address  of  the  index  segment,  currently 
in  core  memory. 
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b.  If  it  is  less  than  a  threshold  value,  then  the 
value  is  a  subscript  to  an  entry  within  the  index  segment, 
currently in core memory. The threshold is currently set at 255
which  is  the  maximum  number  of  index  entries  permitted  in  a  given 
segment.  The  search  program  uses  the  subscript  to  pick  up  the 
next  entry  and  repeat  the  comparison  operation. 

c.  If  it  is  greater  than  the  threshold,  then  the  value 
has  a  double  meaning;  namely,  it  contains  the  disk  address  of  the 
index segment in which the next entry can be found and the
subscript value of that entry within the segment. The search
program uses the disk address to overwrite the current core resident
segment  with  the  new  segment.  The  subscript  value  is  then  used 
to  pick  up  the  desired  entry  and  repeat  the  comparison  loop. 
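
The pointer-following loop described above can be sketched as follows. This is a hedged illustration in Python rather than the prototype's code. It assumes a threshold of 255, that a zero pointer marks the end of a chain (no match), that the LOW or HIGH pointer is chosen by comparing the search key with the stored key, and that a cross-segment pointer packs the segment number and subscript as segment_no * 256 + subscript. The packing rule, the names decode, search and DISK, and the use of a dictionary in place of a disk are all hypothetical.

```python
MAX_ENTRIES = 255  # threshold: pointer values up to this are in-segment subscripts
STRIDE = 256       # hypothetical packing: pointer = segment_no * STRIDE + subscript


def decode(ptr):
    """Return (segment_no, subscript); segment_no is None for an in-segment pointer."""
    if ptr <= MAX_ENTRIES:
        return None, ptr
    return divmod(ptr, STRIDE)


def search(disk, key, first_segment=1, anchor=1):
    """Search the index; return (found, segment_no, subscript, segment_reads)."""
    seg_no, sub = first_segment, anchor
    segment = disk[seg_no]            # read the segment holding the anchor
    reads = 1
    while True:
        kee, low, high = segment[sub]
        if kee == key:
            return True, seg_no, sub, reads
        ptr = low if key < kee else high
        if ptr == 0:                  # assumed: zero marks the end of a chain
            return False, seg_no, sub, reads
        new_seg, sub = decode(ptr)
        if new_seg is not None:       # pointer names an entry in another segment
            seg_no = new_seg
            segment = disk[seg_no]    # overwrite the core-resident segment
            reads += 1


# Stand-in for the disk: segment_no -> {subscript: (KEE, LOW, HIGH)}
DISK = {
    1: {1: ("M", 2, 2 * STRIDE + 1), 2: ("C", 0, 0)},
    2: {1: ("T", 0, 0)},
}
```

On a no-match return, the segment number and subscript identify the last entry compared, which is the back pointer the calling program needs when it adds a new entry.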
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Thus,  two  returns  from  the  search  routine  are  possible, 
either  a  match  or  a  no  match.  In  the  match  case,  the  calling 
program  performs  whatever  processing  is  required  using  the  index 
entry  and  rewrites  the  index  segment  to  disk  upon  completion,  if 
the  specified  index  entry  has  been  modified.  In  the  no-match 
case,  either  an  error  condition  exists  or  the  calling  program 
wants  to  add  a  new  index  entry.  In  the  former  case,  some 
appropriate  error  processing  should  be  performed.  In  the  latter 
case, i.e., index entry load, the calling program is responsible
for finding an empty slot that can be used for the new index
entry. To do this, the ANC field of the index segment is used
since  it  points  to  the  next  available  slot  in  the  segment. 

If  the  ANC  value  for  the  index  segment,  currently  in 
core  memory,  is  non-zero,  then  the  new  entry  can  be  inserted  in 
the  current  segment  and  the  ANC  value  is  the  subscript  to  this 
available  space.  Before  using  the  indicated  entry  space,  the 
calling  program  must  replace  the  current  ANC  value  by  the  value 
of  the  LOW  pointer  in  the  indicated  entry  space.  Thus,  for 
subsequent  users,  ANC  will  have  an  appropriate  subscript  value 
and  continue  to  point  to  the  next  available  entry.  The  new  entry 
is  initialized  as  required  by  the  calling  program  and  both  LOW 
and  HIGH  pointers  are  set  to  zero,  making  the  new  entry  a 
terminal  node.  The  returned  back  pointer  value  is  used  to  make 
the  necessary  linkage  with  the  last  compared  key  to  preserve  the 
logical  ordering  of  the  index. 
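
A minimal sketch of this insertion step, assuming the segment has free space, is given below in Python. The names make_segment and insert_after_no_match are hypothetical, entries are modeled as [KEE, LOW, HIGH] lists, and a back pointer of zero is used here to mean that the index is still empty; the report does not say how that case is signaled.

```python
def make_segment(n):
    """Build an initialized segment; entries are [KEE, LOW, HIGH] at subscripts 1..n."""
    entries = [None] + [["", i + 1 if i < n else 0, 0] for i in range(1, n + 1)]
    return {"anc": 1, "entries": entries}


def insert_after_no_match(seg, key, back):
    """Store `key` in the slot named by ANC and link it from entry `back`.

    `back` is the subscript of the last entry compared by the search
    (assumed to be 0 when the index is empty).
    """
    slot = seg["anc"]
    if slot == 0:
        raise RuntimeError("segment full: space must be found in another segment")
    entry = seg["entries"][slot]
    seg["anc"] = entry[1]                   # ANC takes the slot's LOW pointer
    entry[0], entry[1], entry[2] = key, 0, 0  # new entry becomes a terminal node
    if back:
        parent = seg["entries"][back]
        if key < parent[0]:
            parent[1] = slot                # link through the low sequence pointer
        else:
            parent[2] = slot                # link through the high sequence pointer
    return slot
```

Because ANC is replaced before the slot is filled, the chain of free entries stays intact even though the slot's LOW pointer is then reset to zero.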

When  the  value  of  ANC  in  the  current  core  resident 
segment  equals  zero,  then  the  current  segment  is  full  and  cannot 
hold  a  new  index  entry.  Since  the  LOW  pointer  of  the  last  entry 
in  each  segment  is  initialized  to  zero,  when  this  entry  is  used, 
ANC  will  pick  up  a  zero  value.  In  this  case,  empty  space  must  be 
found in some other segment of the index. Segments of the index
are retrieved sequentially until a segment is found whose ANC
value is non-zero. Note that the starting point for segment
retrieval is specified by a system parameter, like the anchor
pointer, which gives the disk address of the first segment of the
index which has empty space. As an index is initially built,
this address will advance sequentially; however, whenever an
index entry is deleted, thus creating a hole in the index, this
address will be reset to the segment from which the deletion was
made. Thus, empty space will be reused.
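
The bookkeeping for locating free space might be sketched as below. This Python sketch tracks only the ANC value of each segment; the class and method names are hypothetical. One further assumption is labeled in the code: on deletion the sketch takes the lower of the current starting segment and the segment of the deletion, so that an earlier segment with free space is never skipped, whereas the report says only that the address is reset.

```python
class FreeSpaceTracker:
    """Tracks which segment to try first when a new index entry needs a slot."""

    def __init__(self, segment_count):
        # ANC per segment; 1 means the first entry is free, 0 means full.
        self.anc = {s: 1 for s in range(1, segment_count + 1)}
        self.first_free = 1  # system parameter: first segment with empty space

    def locate(self):
        """Scan sequentially from first_free for a segment whose ANC is non-zero."""
        s = self.first_free
        while s in self.anc and self.anc[s] == 0:
            s += 1
        if s not in self.anc:
            raise RuntimeError("no free space in any segment of the index")
        self.first_free = s
        return s

    def note_deletion(self, seg_no, slot):
        """Record that `slot` in segment `seg_no` has been freed."""
        self.anc[seg_no] = slot
        # Assumption: keep the lower segment number (see the lead-in above).
        self.first_free = min(self.first_free, seg_no)
```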




VIII 


DEBUGGING  SUPPORT 


1. Introduction. We have described in Volume 3 of this report a
language  designed  to  facilitate  programming  of  conversational 
processes.  This  language,  called  the  CAINT  Executive  Language 
(CEL)  is  employed  in  writing  executive  programs  to  control  the 
logic of question selection and wording, response analysis, and
various  other  activities.  CEL-written  programs  require 
debugging,  and  ESDP  is  designed  to  include  a  support  package 
specifically  tailored  to  assist  the  executive  programmer  in 
debugging. 

2. Debugging Support Capabilities. We define five basic
debugging capabilities: (1) halt, (2) display, (3) alter, (4)
change, and (5) begin. The capabilities will be represented by
commands of the same name, and it will be possible to embed the
commands in CEL.

For  example, 

IF (Condition) THEN DISPLAY (Variable) ELSE HALT.

3.  Commands. 

a.  HALT 

Halt  the  CEL  program  and  continue  with  the  next 
sequential  instruction  when  the  start  button  on  the  user's 
console  is  pressed. 

b.  DISPLAY 

Display the contents of named variables (in core or
external storage).

For  example, 

A  =  1;  B  =  6;  C  =  A  +  B; 

DISPLAY  (C)  ; 

results  in  a  printout  of: 

C  =  7 

c.  ALTER 

Enables  the  programmer  to  modify  some  area  of  core  or 
external  storage. 

For example,
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Programmer  types: 

C  =  A  -  B  (continuing  the  above  example) 

C is set equal to -5.

d. CHANGE

Enables  the  programmer  to  change  statements  in  the  CEL 
program  being  debugged. 

There  are  three  forms  of  CHANGE  defined: 

(1) CHANGE (DELETE (statement number) TO
(statement number))

(2) CHANGE (REPLACE (statement number) TO
(statement number) WITH source code)

(3) CHANGE (INSERT AFTER (statement number)
source code)
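
The effect of the three forms on a numbered program can be illustrated with a short sketch. It is written in Python, not CEL, and models a program as a list of (statement number, source) pairs. The function names are hypothetical, and so is the numbering rule: the replacement takes the first number of the replaced range, and the caller supplies the number of an inserted statement. The report does not specify how changed statements are renumbered.

```python
def change_delete(program, first, last):
    """CHANGE (DELETE first TO last): drop statements numbered first..last."""
    return [(n, s) for n, s in program if not first <= n <= last]


def change_replace(program, first, last, source):
    """CHANGE (REPLACE first TO last WITH source): one statement replaces the range."""
    out, placed = [], False
    for n, s in program:
        if first <= n <= last:
            if not placed:
                out.append((first, source))  # assumed: reuse the first number
                placed = True
        else:
            out.append((n, s))
    return out


def change_insert_after(program, number, source, new_number):
    """CHANGE (INSERT AFTER number source): add a statement after `number`."""
    out = []
    for n, s in program:
        out.append((n, s))
        if n == number:
            out.append((new_number, source))  # assumed: caller picks the number
    return out


PROGRAM = [(100, "A = 1;"), (200, "B = 6;"), (300, "C = A + B;")]
```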

4.  Use  of  Debugging  Capabilities.  Data  value  changes  may  be 
traced  throughout  the  execution  of  a  program. 

For  example, 

DISPLAY  (X)  ; 

means  display  the  current  value  of  X  every  time  X 
changes  value. 

IF (Y > 5) & (Y < 10) THEN DISPLAY (X);

means  display  the  current  value  of  X  every  time  X 
changes  only  if  Y  is  greater  than  5  but  less  than  10.  In  other 
words, rather than inserting this statement in the CEL program
every time that X is changed, the programmer can state the
condition and desired action once, and the command will be in
effect  throughout  the  program  execution. 
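
The idea of stating a display condition once and having it apply to every later change of X can be sketched as follows. This is a Python illustration, not CEL. The class Watched and its methods are hypothetical, and the sketch records each display line in a list instead of printing it.

```python
_MISSING = object()  # sentinel for "variable has no value yet"


class Watched:
    """Variable store that displays a variable whenever its value changes."""

    def __init__(self):
        self._vars = {}
        self._watches = []   # (variable name, condition on all variables)
        self.log = []        # display lines, in the order produced

    def display_when(self, name, condition=lambda v: True):
        """Register once: display `name` on every change while `condition` holds."""
        self._watches.append((name, condition))

    def set(self, name, value):
        changed = self._vars.get(name, _MISSING) != value
        self._vars[name] = value
        if changed:
            for watched, condition in self._watches:
                if watched == name and condition(self._vars):
                    self.log.append(f"{name} = {value}")


w = Watched()
# Counterpart of: IF (Y > 5) & (Y < 10) THEN DISPLAY (X);
w.display_when("X", lambda v: 5 < v.get("Y", 0) < 10)
w.set("Y", 3)
w.set("X", 1)    # Y is 3: not displayed
w.set("Y", 7)
w.set("X", 2)    # Y is 7: displayed
w.set("X", 2)    # value unchanged: not displayed
w.set("Y", 12)
w.set("X", 3)    # Y is 12: not displayed
```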

Another  option  is  deferral  of  printout. 

For  example, 

DISPLAYD  (X)  ; 

means  record  all  value  changes  of  X  and  print  off-line. 

A particular CAINT application might be a deferred display of all
the questions (in sequence) output for a given run.

During  a  debug  run  some  means  of  data  base  protection 
would  have  to  be  provided.  This  might  take  the  form  of  putting 
the  data  base  in  a  read-only  mode  for  each  debug  application. 
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This  would  mean  that  any  area  of  the  data  base  could  be  read,  but 
writing would be directed to a scratch file, and any attempts to
read "changed" data base records would also be directed to the
scratch  file  (onto  which  they  had  previously  been  written). 
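
The read-only protection scheme just described amounts to an overlay: writes go to the scratch file, and reads consult the scratch file before the data base. A minimal Python sketch follows. The class name and the use of dictionaries for the data base and scratch file are assumptions for illustration only.

```python
class DebugDataBase:
    """Read-only view of a data base; all writes land in a scratch area."""

    def __init__(self, base):
        self._base = base      # the protected data base, never modified here
        self._scratch = {}     # stands in for the scratch file

    def read(self, key):
        # A "changed" record is read back from the scratch file.
        if key in self._scratch:
            return self._scratch[key]
        return self._base[key]

    def write(self, key, record):
        # Writing is redirected; the data base itself is untouched.
        self._scratch[key] = record


base = {"R1": "original record", "R2": "another record"}
db = DebugDataBase(base)
db.write("R1", "record changed during debug run")
```

After the debug run the scratch area can simply be discarded, leaving the data base exactly as it was.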


A typical CHANGE and ALTER situation could occur following
detection of a bug. A CHANGE command (for replacement, deletion, or
insertion of statements) could be used to attempt error
correction. An ALTER command could then be used to reinitialize
data variables to reasonable values for the point at which the
program is restarted (using the BEGIN command).

5. Methods of Implementation. There are two general methods to
support  the  CEL  with  the  debugging  capabilities  described  above: 
compilation  and  interpretation. 

Some  pre-processing  would  be  required  to  prepare  a  deck 
for  compilation  so  as  to  support  the  debugging  capabilities 
described  above.  This  could  be  coupled  with  an  interrogation 
designed  to  elicit  from  the  programmer  the  specific  debugging 
requirements  for  each  program. 

Consider  a  simple  example  of  the  kind  of  preprocessing 
under  discussion. 


The  programmer  writes  the  following  code  in  which  the 
numbers  on  the  left  are  machine  generated  statement  numbers,  to 
be  used  by  the  programmer  as  operands  of  a  CHANGE  command. 


S00100   L1: DO;
S00200       IF UOP.NUMENS = 0 THEN CALL OUT ('NO MEMBERS', UOPAD);
S00400       ELSE CALL OUT (MEMLIST, NUMENS);
S00500       IF UOPAD = UOPEND THEN GO TO L2;
S00700       UOPAD = UOPAD + 1;
S00800       CALL NEXT (UOPAD);
S00900       GO TO L1;
S01000   L2: END;


The following is an example of a dialogue that could
then take place:

MSG  Type the number of each command to be used
     1. HALT
     2. DISPLAY
     3. ALTER

RES  2

MSG  Which variables are to be displayed?

RES  UOP.NUMENS

MSG  Which variables are to control the display of UOP.NUMENS?

RES  UOP.NUMENS

MSG  For which new values of UOP.NUMENS is UOP.NUMENS to be
     displayed?

RES  UOP.NUMENS > 0

MSG  Pre-processing has begun

MSG  Compilation has begun

MSG  At which statement do you want execution to begin? (Type
     Number 1 - 10).

RES  1

MSG  Execution has begun

Etc.


The deck produced by the pre-processor looks as follows:

LlL ?: PROC OPTIONS (MAIN);

%INCLUDE DATA (DCL1);    Includes declarations necessary to define
                         data base references in program.

%INCLUDE TEMP (CODE);    Includes code to initialize variables in
                         support of DISPLAY. (This code was
                         generated as a result of the dialogue
                         pictured above.)

%INCLUDE PROS (CHECK);   Includes this system (ESDP) program as
                         internal procedure. This program
                         supports the debugging capabilities
                         described above.

LAB1 (1):   L1: DO;

LAB1 (1) defines th

LAB1 (2):   IF UOP.NUMENS = 0
            THEN
LAB1 (3):   CALL OUT ('NO MEMBERS',
LAB1 (4):   ELSE CALL OUT (MEMLIST, NUMENS);
            CALL CHECK;
LAB1 (5):   IF UOPAD = UOPEND
            THEN
LAB1 (6):   GO TO L2;
LAB1 (7):   UOPAD = UOPAD + 1;
            CALL CHECK;
LAB1 (8):   CALL NEXT (UOPAD);
            CALL CHECK;
LAB1 (9):   GO TO L1;
LAB1 (10):  L2: END;


In summary:

The  HALT  command  could  have  been  enabled  by  the  same 
interrogation process which enabled the DISPLAY command. The
program  can  still  be  halted  by  the  programmer  by  pressing  the 
stop  button  on  the  console. 

The  CHANGE  command  can  be  utilized  following  a  halt  by 
means  of  a  dialogue  with  the  CHECK  routine. 

The  DISPLAY  command  has  been  selectively  enabled 
through interrogation.

The  ALTER  command  could  have  been  enabled  by  means  of 
interrogation. 

The  BEGIN  command  is  supported  by  embedding  the  program 
in  a  label  array  as  shown. 

The main differences between compilation and
interpretation are (1) interpretation will incur some
implementation costs whereas the compiler is essentially free,
(2) interpretive program CHANGES can be made more rapidly,
because it is not necessary to recompile and linkage edit, and
(3) interpretive debugging commands can be entered at execution
time.
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